arXiv:1501.01366v1 [math.ST] 7 Jan 2015


Sequential Design for Computerized 
Adaptive Testing that Allows for 
Response Revision 

Shiyu Wang, Georgios Fellouris and Hua-Hua Chang

Abstract: In computerized adaptive testing (CAT), items (questions) 
are selected in real time based on the already observed responses, so that 
the ability of the examinee can be estimated as accurately as possible.
This is typically formulated as a non-linear, sequential, experimental
design problem with binary observations that correspond to the true 
or false responses. However, most items in practice are multiple-choice 
and dichotomous models do not make full use of the available data. 
Moreover, CAT has been heavily criticized for not allowing test-takers 
to review and revise their answers. In this work, we propose a novel 
CAT design that is based on the polytomous nominal response model 
and in which test-takers are allowed to revise their responses at any time 
during the test. We show that as the number of administered items goes 
to infinity, the proposed estimator is (i) strongly consistent for any item 
selection and revision strategy and (ii) asymptotically normal when the 
items are selected to maximize the Fisher information at the current 
ability estimate and the number of revisions is smaller than the number 
of items. We also present the findings of a simulation study that supports 
our asymptotic results. 

1. Introduction

A main goal in educational assessment is the accurate estimation of each test- 
taker’s ability, which is a kind of latent trait. In a conventional paper-pencil 
test, this estimation is based on the examinee’s responses to a preassembled 
set of items. On the other hand, in Computerized Adaptive Testing (CAT), 
items are selected in real time, i.e., the next item depends on the already 
observed responses. In this way, it is possible to tailor the difficulty of the 
items to the examinee’s ability and estimate the latter more efficiently than 
that in a paper-pencil test. This is especially true for examinees at the two 
extreme ends of the ability distribution, who may otherwise receive items 
either too difficult or too easy. CAT was originally proposed by Lord [2] and 
with the rapid development of modern technology it has become popular 
for many kinds of measurement tasks, such as educational testing, patient 
reported outcome, and quality of life measurement. Examples of large-scale 
CATs include the Graduate Management Admission Test (GMAT), the National
Council Licensure Examination (NCLEX) for nurses, and the Armed
Services Vocational Aptitude Battery (ASVAB) [4]. 

The two main tasks in a CAT, i.e., ability estimation and item selection, 
depend heavily on Item Response Theory (IRT) for modeling the response 
of the examinee. This is done by specifying the probability of a correct 
answer as a function of certain item-specific parameters and the ability level, 
which is represented by a scalar parameter $\theta$. For example, in the
two-parameter logistic (2PL) model, the probability of a correct answer is equal
to $H(a(\theta - b))$, where $H(x) = e^x/(1 + e^x)$. The item parameters for this
model are the difficulty parameter $b$ and the discrimination parameter $a$.
The 2PL is an extension of the Rasch model [20], which corresponds to the
special case that $a = 1$. On the other hand, the 2PL can be generalized
by adding a parameter that captures the probability of guessing the right
answer (3PL model).
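
As a minimal illustration of the dichotomous models just described, the following Python sketch evaluates the 2PL response probability $H(a(\theta - b))$ and a 3PL variant. The function names and the guessing-parameter name `g` are hypothetical, and the 3PL form $g + (1 - g)H(a(\theta - b))$ is the commonly used parametrization, assumed here because the text only says that a guessing parameter is added.

```python
import math


def logistic(x: float) -> float:
    """H(x) = e^x / (1 + e^x), evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct answer under the 2PL model."""
    return logistic(a * (theta - b))


def p_correct_3pl(theta: float, a: float, b: float, g: float) -> float:
    """3PL variant (assumed form): g is the lower asymptote of the curve."""
    return g + (1.0 - g) * logistic(a * (theta - b))
```

Setting `a = 1` in `p_correct_2pl` gives the Rasch model, and `g = 0` in `p_correct_3pl` recovers the 2PL.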

Given the IRT model, a standard approach for item selection, proposed 
by Lord [17], is to select the item that maximizes the Fisher information of 
the model at each step. For the above logistic models, this item selection 
procedure suggests selecting the item with difficulty parameter $b$ equal to $\theta$.
Since $\theta$ is unknown, this implies that the difficulty parameter for item $i$, $b_i$,
should be equal to $\hat\theta_{i-1}$, the estimate of $\theta$ based on the first $i-1$ observations.
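
To see why matching difficulty to ability maximizes the information, the following short derivation may help. It covers the 2PL model only (not the 3PL), and it relies on the standard expression for the Fisher information of a binary observation, which is not spelled out in the text.

```latex
% Fisher information of a single 2PL item with p(\theta) = H(a(\theta - b)):
\[
  I(\theta; a, b)
    = \frac{\bigl(p'(\theta)\bigr)^2}{p(\theta)\,\bigl(1 - p(\theta)\bigr)}
    = a^2\, H\bigl(a(\theta - b)\bigr)\,\bigl[1 - H\bigl(a(\theta - b)\bigr)\bigr],
\]
% because H'(x) = H(x)(1 - H(x)). The product H(1 - H) is largest when H = 1/2,
% that is, when b = \theta, in which case I(\theta; a, \theta) = a^2 / 4.
```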

As suggested by Wu [25, 26], the adaptive estimation of $\theta$ can be
achieved via a likelihood-based approach, instead of the non-parametric
Robbins-Monro [18] algorithm that had been originally proposed by Lord
[13] and can be very inefficient with binary data [12]. When $\hat\theta_i$ is selected
to be the Maximum Likelihood Estimator (MLE) of $\theta$ based on the first $i$
observations, the resulting final ability estimator was shown to be strongly
consistent and asymptotically normal by Ying and Wu [24] under the Rasch
model and Chang and Ying [5] under the 2PL and the 3PL models.

However, while the design and analysis of CAT in the educational and 
statistical literature typically assumes dichotomous IRT models, most operational
CAT programs employ multiple-choice items, for which dichotomous
models are unable to differentiate among the (more than one) incorrect answers.
This implies a loss of efficiency that could be avoided if a polytomous
IRT model, such as Bock’s nominal response model [2], was used instead.
Indeed, based on a simulation study, de Ayala [6] found that a CAT based 
on the nominal response model leads to a more accurate ability estimator 
than a CAT that is based on the 3PL model. However, to our knowledge, 
there has not been any theoretical support for this claim. In fact, generalizing
the results in [5] and [24] to the case of the nominal response model
is a highly non-trivial problem, since for items with $m > 2$ categories there
are $2(m-1)$ parameters that need to be selected at each step and there is no
convenient, explicit form for the item parameters that maximize the Fisher
information.

Our first contribution is that we study theoretically the design of a CAT 
that is based on the nominal response model with an arbitrary number of 
categories. Specifically, assuming that the responses are conditionally independent
given the selected items and that the item parameters belong to a
bounded set, we prove (Theorem 3.1) that the MLE of $\theta$ (with any item selection
strategy) is strongly consistent as the number of administered items
goes to infinity. If additionally each item is selected to maximize the Fisher
information at the current MLE of the ability level, we show that the MLE
of $\theta$ becomes asymptotically normal and efficient (Theorem 3.2). The significance
of this first contribution is the design of a CAT that is based on the polytomous
nominal response model and uses the full capacity of multiple-choice items, in
comparison to a dichotomous model that wastes information by treating
them as binary (true/false).

Our second main contribution in this work is that we show that a CAT
design with the nominal response model can be used to alleviate the major 
criticism that is addressed to CAT: the fact that test-takers are not allowed 
to review and revise their answers during the test. Indeed, it is commonly 
believed that revision conflicts with the adaptive nature of CAT, and, hence, 
decreases efficiency and leads to biased ability estimation [20, 21, 22, 23]. 
Thus, none of the currently operational CAT programs allows for response 
revision, which is allowed by the traditional paper-pencil tests. This has 
become a main concern for both examinees and testing companies, and for 
this reason some test programs have decided to switch from CAT to other 
modes of testing [15]. 

On the other hand, it is clear that the response revision feature can provide
a more user-friendly environment, by helping alleviate the test-takers’
anxiety. It may even lead to a more reliable ability estimation, by reducing 
the measurement error that is associated with careless mistakes (that the 
examinees may correct). Therefore, it has been a long-standing problem to 
incorporate the response revision feature in CAT. Certain modified designs 
have been proposed for this purpose, such as CAT with restricted review 
models [20] and multistage adaptive testing [15], and it has been argued 
that if appropriate review and revision rules are set, there will be no impact 
on the estimation accuracy and efficiency [11, 23]. However, all these studies
(that either support or oppose response revision in CAT) rely on Monte
Carlo simulation experiments and lack a theoretical foundation.

In this work, we propose a CAT design that allows for response revision
and we establish its asymptotic properties under a rigorous statistical framework.
Specifically, assuming that we have multiple-choice items with $m \ge 3$
categories, our main idea is to exploit the flexibility of the nominal response
model in order to obtain an algorithm that gives partial credit when the
examinee corrects a previously wrong answer. Moreover, our setup for revision
is very flexible: each examinee is allowed to revise a previous answer
at any time during the test as long as each item is revised at most $m - 2$
times. However, this leads to a non-standard experimental design problem
which differs from the traditional CAT setup in two ways. First, items need
to be selected at certain random times, which are determined by the examinee.
Second, information is now accumulated at two time-scales: that of the
observations/responses and that of the items.


In order to address this problem, we assume, as in the context of the standard
CAT, that responses from different items are conditionally independent
and that the nominal response model governs the first response to each item.
However, we now further assume that whenever an item is revised during
the test, the new response will follow the conditional pmf of the nominal
response model given that previous answers cannot be repeated. Our final
ability estimator is the maximizer of the conditional likelihood of all observations
(first responses and revisions) given the selected item parameters
and the observed decisions of the examinee to revise or not at each step. We
show (Theorem 4.1) that this estimator is strongly consistent for any item
selection and revision strategy. When in particular the items are selected
to maximize the Fisher information of the nominal response model at the
current ability estimate and, additionally, the number of revisions is "small"
relative to the number of items, we show that the proposed estimator is also
asymptotically normal, with the same asymptotic variance as that in the
regular CAT (Theorem 4.2).

From a practical point of view, the most important feature of our approach 
is that it incorporates revision without the need to calibrate any item
parameters beyond the ones used in a regular CAT that is based on the
nominal response model. Indeed, if a dichotomous IRT model was employed 
instead, incorporating revision would require calibrating the probability of 
switching from a correct answer to a wrong answer and vice-versa for all 
items in the pool. This is a very difficult task in practice and probably 
infeasible for large-scale implementation. 

The rest of the paper is organized as follows. In Section 2, we introduce 
the nominal response model and its main properties. In Section 3, we focus 
on the design and asymptotic analysis of a regular CAT that is based on 
the nominal response model. In Section 4, we formulate the problem of CAT 
design that allows for response revision, we present the proposed scheme and 
establish its asymptotic properties. In Section 5, we present the findings of 
a simulation study that illustrates our theoretical results. We conclude in 
Section 6. 

2. Nominal Response Model


In this section, we introduce the nominal response model, which is the IRT
model that we will use for the design of CAT in the next sections. Throughout
the paper, we focus on the case of a single examinee, whose ability is quantified
by a scalar parameter $\theta \in \mathbb{R}$ that is the quantity of interest. Thus, the
underlying probability measure is denoted by $P_\theta$.

Let $X$ be the response to a generic multiple-choice item with $m \ge 2$
categories. That is, $X = k$ when the examinee chooses category $k$, where
$1 \le k \le m$, and the nominal response model assumes that

$$P_\theta(X = k) = \frac{\exp(a_k \theta + c_k)}{\sum_{h=1}^{m} \exp(a_h \theta + c_h)}, \quad 1 \le k \le m, \tag{2.1}$$

where $\{a_k, c_k\}_{1 \le k \le m}$ are real numbers that satisfy

$$\sum_{k=1}^{m} |a_k| \neq 0 \quad \text{and} \quad \sum_{k=1}^{m} |c_k| \neq 0 \tag{2.2}$$

and the following identifiability conditions:

$$\sum_{k=1}^{m} a_k = \sum_{k=1}^{m} c_k = 0. \tag{2.3}$$

The latter assumption implies that one of the $a_k$'s and one of the $c_k$'s is
completely determined by the others. As a result, without loss of generality
we can say that the distribution of $X$ is completely determined by the ability
parameter $\theta$ and the vector $\mathbf{b} := (a_2, \ldots, a_m, c_2, \ldots, c_m)$. In order to simplify
the notation we will write:

$$p_k(\theta; \mathbf{b}) := P_\theta(X = k), \quad 1 \le k \le m. \tag{2.4}$$

Note that in the case of binary data ($m = 2$), the nominal response model
recovers the 2PL model with discrimination parameter $2|a_1|$ and difficulty
parameter $-c_2/a_2$. In particular, (2.3) implies $a_1 = -a_2$, $c_1 = -c_2$, so that

$$p_2(\theta; \mathbf{b}) = 1 - p_1(\theta; \mathbf{b}) = \frac{\exp(2a_2\theta + 2c_2)}{1 + \exp(2a_2\theta + 2c_2)}.$$
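
The category probabilities in (2.1) and the binary reduction just described can be checked numerically. The sketch below is illustrative only: the function name `nrm_probs` and the example parameter values are hypothetical, and the parameters are passed as full length-$m$ lists rather than as the reduced vector $\mathbf{b}$.

```python
import math


def nrm_probs(theta, a, c):
    """Category probabilities p_k(theta; b), k = 1..m, from (2.1).

    `a` and `c` are the full length-m parameter lists (a_1..a_m, c_1..c_m).
    """
    z = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    z_max = max(z)  # subtract the maximum before exponentiating to avoid overflow
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]


# Binary case: a_1 = -a_2 and c_1 = -c_2, as implied by (2.3).
a, c = [-0.8, 0.8], [0.3, -0.3]
theta = 1.0
p2 = nrm_probs(theta, a, c)[1]
# 2PL with discrimination 2 * a_2 = 1.6 and difficulty -c_2 / a_2 = 0.375.
p2_2pl = 1.0 / (1.0 + math.exp(-1.6 * (theta - 0.375)))
```

Here `p2` and `p2_2pl` agree up to floating-point error, which is the statement that the nominal response model with $m = 2$ is a 2PL model.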
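
The log-likelihood and the score function of $\theta$ take the form

$$\ell(\theta; \mathbf{b}, X) := \log P_\theta(X) = \sum_{k=1}^{m} \log\big(p_k(\theta; \mathbf{b})\big)\, 1_{\{X = k\}}, \tag{2.5}$$

$$s(\theta; \mathbf{b}, X) := \frac{d}{d\theta} \ell(\theta; \mathbf{b}, X) = \sum_{k=1}^{m} \big[a_k - \bar{a}(\theta; \mathbf{b})\big]\, 1_{\{X = k\}}, \tag{2.6}$$

where $\bar{a}(\theta; \mathbf{b})$ is the following weighted average of the $a_k$'s:

$$\bar{a}(\theta; \mathbf{b}) := \sum_{h=1}^{m} a_h\, p_h(\theta; \mathbf{b}). \tag{2.7}$$

The Fisher information of $X$ as a function of $\theta$ takes the form:

$$J(\theta; \mathbf{b}) := \mathrm{Var}_\theta[s(\theta; \mathbf{b}, X)] = \sum_{k=1}^{m} \big(a_k - \bar{a}(\theta; \mathbf{b})\big)^2\, p_k(\theta; \mathbf{b}), \tag{2.8}$$

whereas the derivative of $s(\theta; \mathbf{b}, X)$ with respect to $\theta$ does not depend on $X$
and is equal to $-J(\theta; \mathbf{b})$, which justifies the following notation:

$$s'(\tilde\theta; \mathbf{b}) := \frac{d}{d\theta} s(\theta; \mathbf{b}, X) \Big|_{\theta = \tilde\theta} = -J(\tilde\theta; \mathbf{b}). \tag{2.9}$$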


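Moreover, $J(\theta; \mathbf{b})$ is positive and has an upper bound that is independent of
$\theta$; in particular,

$$0 < J(\theta; \mathbf{b}) \le \sum_{k=1}^{m} a_k^2\, p_k(\theta; \mathbf{b}) \le \sum_{k=1}^{m} a_k^2 \le m\, (a^*(\mathbf{b}))^2, \tag{2.10}$$

where we denote by $a^*(\mathbf{b})$ and $a_*(\mathbf{b})$ the maximum and minimum of the $a_k$'s,
respectively, i.e.,

$$a^*(\mathbf{b}) := \max_{1 \le k \le m} a_k \quad \text{and} \quad a_*(\mathbf{b}) := \min_{1 \le k \le m} a_k.$$

The first inequality holds in (2.10) because the $a_k$'s cannot be identical, due
to (2.2)-(2.3). However, while for any given $\theta \in \mathbb{R}$ and $\mathbf{b}$ we have
$a_*(\mathbf{b}) < \bar{a}(\theta; \mathbf{b}) < a^*(\mathbf{b})$, from (2.1) it follows that $\bar{a}(\theta; \mathbf{b}) \to a_*(\mathbf{b})$ as $\theta \to -\infty$ and
$\bar{a}(\theta; \mathbf{b}) \to a^*(\mathbf{b})$ as $\theta \to +\infty$ and, consequently,

$$\lim_{|\theta| \to \infty} J(\theta; \mathbf{b}) = 0, \tag{2.11}$$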

i.e., the Fisher information of the model goes to 0 as the ability level goes to
$\pm\infty$ for any given item parameter.
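
The quantities in (2.7) and (2.8), and the decay of the information in (2.11), can be explored with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the item parameters are made-up values satisfying (2.3), and the parameters are again passed as full length-$m$ lists.

```python
import math


def nrm_probs(theta, a, c):
    """Category probabilities of the nominal response model, as in (2.1)."""
    z = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    z_max = max(z)
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]


def mean_slope(theta, a, c):
    """Weighted average a-bar(theta; b) of (2.7)."""
    p = nrm_probs(theta, a, c)
    return sum(a_k * p_k for a_k, p_k in zip(a, p))


def fisher_info(theta, a, c):
    """Fisher information J(theta; b) of (2.8)."""
    p = nrm_probs(theta, a, c)
    a_bar = sum(a_k * p_k for a_k, p_k in zip(a, p))
    return sum((a_k - a_bar) ** 2 * p_k for a_k, p_k in zip(a, p))


# Hypothetical 4-category item; both parameter lists sum to zero, as in (2.3).
a = [-1.5, -0.5, 0.5, 1.5]
c = [0.4, -0.1, 0.2, -0.5]
for theta in (-20.0, -2.0, 0.0, 2.0, 20.0):
    # The printed information shrinks towards 0 at both extremes, as in (2.11).
    print(theta, fisher_info(theta, a, c))
```

Since the score in (2.6) equals $a_X - \bar{a}(\theta; \mathbf{b})$, relation (2.9) says that the derivative of `mean_slope` with respect to `theta` equals `fisher_info`, which is easy to confirm with a finite difference.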

Since in practice items are drawn from a given item bank, we will assume
that $\mathbf{b}$ takes values in a compact subset $\mathbb{B}$ of $\mathbb{R}^{2m-2}$, which is a rather realistic
assumption. This assumption will be
technically useful through the following result (Maximum Theorem), whose
proof can be found, for example, in [19], p. 239.

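Lemma 1. If $g : \mathbb{R} \times \mathbb{B} \to \mathbb{R}$ is a continuous function, then $\sup_{\mathbf{b} \in \mathbb{B}} g(\cdot, \mathbf{b})$
and $\inf_{\mathbf{b} \in \mathbb{B}} g(\cdot, \mathbf{b})$ are also continuous functions. Thus, if $x_n \to x_0$, then
$\sup_{\mathbf{b} \in \mathbb{B}} |g(x_n, \mathbf{b}) - g(x_0, \mathbf{b})| \to 0$.

As a first illustration of this result, note that since $J(\theta; \mathbf{b})$ is jointly
continuous,

$$\theta \mapsto J_*(\theta) := \inf_{\mathbf{b} \in \mathbb{B}} J(\theta; \mathbf{b}) \quad \text{and} \quad \theta \mapsto J^*(\theta) := \sup_{\mathbf{b} \in \mathbb{B}} J(\theta; \mathbf{b}) \tag{2.12}$$

are also continuous functions. Moreover, from Lemma 1 and (2.10) it follows
that there is an upper (but not a positive lower) bound on the Fisher
information that holds uniformly in $\theta$, i.e.,

$$0 < J_*(\theta) \le J^*(\theta) \le K := m \sup_{\mathbf{b} \in \mathbb{B}} (a^*(\mathbf{b}))^2, \quad \forall\, \theta \in \mathbb{R}. \tag{2.13}$$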

3. Design of standard CAT with Nominal Response Model 
3.1. Problem formulation 

In this section we focus on the design of a CAT with a fixed number of items,
$n$, each of which has $m \ge 2$ categories. Let $X_i$ denote the response to item $i$;
thus, $X_i = k$ if the examinee chooses category $k$ in item $i$, where $1 \le k \le m$
and $1 \le i \le n$. We assume that the responses are governed by the nominal
response model, defined in (2.1), so that

$$P_\theta(X_i = k) := p_k(\theta; \mathbf{b}_i), \quad 1 \le k \le m, \; 1 \le i \le n, \tag{3.1}$$

where $\theta$ is the scalar parameter of interest that represents the ability of
the examinee and $\mathbf{b}_i := (a_{i2}, \ldots, a_{im}, c_{i2}, \ldots, c_{im})$ is a $\mathbb{B}$-valued vector that
characterizes item $i$ and satisfies (2.2)-(2.3). Moreover, we assume that the
responses are conditionally independent given the selected items, in the sense
that

$$P_\theta(X_1, \ldots, X_i \mid \mathbf{b}_1, \ldots, \mathbf{b}_i) = \prod_{j=1}^{i} P_\theta(X_j \mid \mathbf{b}_j), \quad 1 \le i \le n. \tag{3.2}$$

However, while in a conventional paper-pencil test the parameters $\mathbf{b}_1, \ldots, \mathbf{b}_n$
are deterministic, in a CAT they are random, determined in real time based
on the already observed responses. Specifically, let $\mathcal{F}_i$ be the information
contained in the first $i$ responses, i.e., $\mathcal{F}_i := \sigma(X_1, \ldots, X_i)$. Then, each
$\mathbf{b}_{i+1}$ is an $\mathcal{F}_i$-measurable, $\mathbb{B}$-valued random vector and, as a result, despite
assumption (3.2), the responses are far from independent and, in fact, they
may have a complex dependence structure.

The problem in CAT is to find an ability estimator, $\hat\theta_n$, at the end of the
test, i.e., an $\mathcal{F}_n$-measurable estimator of $\theta$, and an item selection strategy,
$(\mathbf{b}_{i+1})_{1 \le i \le n-1}$, so that the accuracy of $\hat\theta_n$ is optimized. If we were able to
select each item $i$ so that $J(\theta; \mathbf{b}_i) = J^*(\theta)$, where $J^*(\theta)$ is the maximum
Fisher information an item can achieve (recall (2.12)) at the true ability
level $\theta$, then we could use standard asymptotic theory in order to obtain
an estimator, $\hat\theta_n$, such as the MLE, that is asymptotically efficient, in the
sense that $\sqrt{n}\,(\hat\theta_n - \theta) \to \mathcal{N}(0, [J^*(\theta)]^{-1})$ as $n \to \infty$. Of course, this is not a
feasible item selection strategy, as it requires knowledge of $\theta$, the parameter
we are trying to estimate! Nevertheless, we can make use of the adaptive
nature of CAT and select items that maximize the Fisher information at the
current estimate of the ability level. That is, $\mathbf{b}_{i+1}$ can be chosen to belong
to

$$\operatorname*{argmax}_{\mathbf{b} \in \mathbb{B}} J(\hat\theta_i; \mathbf{b}), \tag{3.3}$$

where $\hat\theta_i$ is an estimate of the ability level that is based on the first $i$ responses,
$1 \le i \le n$.

We should note that this item selection method assumes that each $\mathbf{b}_i$ can
take any value in $\mathbb{B}$. Of course, this is not the case in practice, where a given
item bank has a finite number of items and there are restrictions on the 
exposure rate of the items [3]. Nevertheless, this item selection strategy will 
provide a benchmark for the best possible performance that can be expected, 
at least in an asymptotic sense. 
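
For a finite item bank, the selection rule (3.3) reduces to a search over the items that have not yet been administered. The following sketch illustrates this restricted version only; it is not the idealized rule over all of $\mathbb{B}$ analyzed in the text, it ignores exposure-rate constraints, and the function names and the three-item bank are hypothetical.

```python
import math


def nrm_probs(theta, a, c):
    """Category probabilities of the nominal response model, as in (2.1)."""
    z = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    z_max = max(z)
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]


def fisher_info(theta, a, c):
    """Fisher information J(theta; b) of (2.8)."""
    p = nrm_probs(theta, a, c)
    a_bar = sum(a_k * p_k for a_k, p_k in zip(a, p))
    return sum((a_k - a_bar) ** 2 * p_k for a_k, p_k in zip(a, p))


def select_item(theta_hat, bank, administered):
    """Index of the unused item in `bank` with the largest J(theta_hat; b)."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: fisher_info(theta_hat, *bank[i]))


# Hypothetical bank: each item is a pair (a, c) of full length-m lists.
bank = [
    ([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]),   # informative near theta = 0
    ([-0.2, 0.0, 0.2], [0.0, 0.0, 0.0]),   # weakly discriminating everywhere
    ([-1.0, 0.0, 1.0], [-3.0, 0.0, 3.0]),  # informative near theta = -3
]
first = select_item(0.0, bank, set())
second = select_item(0.0, bank, {first})
```

With the current estimate at 0, the first item of the bank is chosen; once it has been used, the search falls back on the next most informative item at that estimate.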

3.2. Adaptive Maximum Likelihood Estimation of $\theta$

The item selection strategy (3.3) calls for an adaptive estimation of the examinee’s
ability during the test process. From the conditional independence
assumption (3.2) it follows that the conditional log-likelihood function of
the first $i$ responses given the selected items takes the form

$$L_i(\theta) := \log P_\theta(X_1, \ldots, X_i \mid \mathbf{b}_1, \ldots, \mathbf{b}_i) = \sum_{j=1}^{i} \ell(\theta; \mathbf{b}_j, X_j),$$

where $\ell(\theta; \mathbf{b}_j, X_j)$ is the log-likelihood that corresponds to the $j$th response
and is determined by the nominal response model, according to (2.5). Then,
the corresponding score function takes the form

$$S_i(\theta) := \frac{d}{d\theta} L_i(\theta) = \sum_{j=1}^{i} s(\theta; \mathbf{b}_j, X_j), \tag{3.4}$$

where $s(\theta; \mathbf{b}_j, X_j)$ is the score function that corresponds to the $j$th item and
is defined according to (2.6). We would like our estimate for $\theta$ after the first
$i$ observations to be the root of $S_i(\theta)$. Unfortunately, this root does not exist
for every $1 \le i \le n$. Indeed, $S_i(\theta)$ does not have a root when all acquired
responses either correspond to the category with the largest $a$-value, or to
the category with the smallest $a$-value. In other words, the root of $S_i(\theta)$ exists
and is unique for every $i > n_0$, where

$$n_0 := \max\Big\{ i \in \{1, \ldots, n\} : X_j \in \operatorname{argmax}\{a_{jk}\}_{k=1}^{m} \;\; \forall\, j \le i \;\text{ or }\; X_j \in \operatorname{argmin}\{a_{jk}\}_{k=1}^{m} \;\; \forall\, j \le i \Big\}.$$

For example, in a CAT with $n = 7$ items of $m = 4$ categories where for each
item the largest (resp. smallest) $a$-value is associated with category 4 (resp.
1), for the sequence of responses 1, 1, 1, 3, 4, 1, 3 we have $n_0 = 3$.
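
The index $n_0$ can be computed with a single pass over the responses. The sketch below reproduces the example above; the function name is hypothetical, the $a$-values are made up (only their ordering matters), and returning 0 when the defining set is empty is a convention assumed here, since the text does not address that case.

```python
def initial_run_length(responses, a_values):
    """Largest i such that the first i responses all hit the largest a-value,
    or all hit the smallest a-value, of their items (0 if there is no such i).

    responses[j] is the chosen category (1-based) on item j + 1 and
    a_values[j] is the list (a_{j1}, ..., a_{jm}) of that item.
    """
    all_max = all_min = True
    n0 = 0
    for i, (x, a) in enumerate(zip(responses, a_values), start=1):
        all_max = all_max and a[x - 1] == max(a)
        all_min = all_min and a[x - 1] == min(a)
        if not (all_max or all_min):
            break
        n0 = i
    return n0


# Example from the text: category 4 has the largest a-value, category 1 the smallest.
a4 = [-1.5, -0.5, 0.5, 1.5]
n0 = initial_run_length([1, 1, 1, 3, 4, 1, 3], [a4] * 7)  # n0 == 3
```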

For $i \le n_0$, an initial estimation procedure is needed to estimate the
ability parameter. A possible initialization strategy is to set $\hat\theta_0 = 0$ and, for
every $i \le n_0$, $\hat\theta_i = \hat\theta_{i-1} + d$ (resp. $\hat\theta_i = \hat\theta_{i-1} - d$) if the acquired responses have
the largest (resp. smallest) $a$-value, where $d$ is a predetermined constant.
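
Once $i > n_0$, the estimate is the unique root of the score $S_i(\theta)$ in (3.4). Because the derivative of the score is minus a sum of positive Fisher informations, the score is strictly decreasing, so one simple way to locate the root is bisection. The following sketch makes that choice for illustration; the text does not prescribe a root-finding method, and the function names, search interval, and tolerance are assumptions.

```python
import math


def nrm_probs(theta, a, c):
    """Category probabilities of the nominal response model, as in (2.1)."""
    z = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    z_max = max(z)
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]


def score(theta, items, responses):
    """S_i(theta) of (3.4): sum over items of a_{j, X_j} - a-bar(theta; b_j)."""
    total = 0.0
    for (a, c), x in zip(items, responses):
        p = nrm_probs(theta, a, c)
        total += a[x - 1] - sum(a_k * p_k for a_k, p_k in zip(a, p))
    return total


def mle(items, responses, lo=-10.0, hi=10.0, tol=1e-10, max_doublings=20):
    """Root of the strictly decreasing score function, found by bisection.

    Raises ValueError if no sign change is found, which happens for instance
    when all responses hit the largest (or all the smallest) a-value.
    """
    for _ in range(max_doublings):
        if score(lo, items, responses) > 0.0 and score(hi, items, responses) < 0.0:
            break
        lo, hi = 2.0 * lo, 2.0 * hi  # widen the search interval and retry
    else:
        raise ValueError("score has no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, items, responses) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two identical symmetric items answered with the lowest and highest category:
# by symmetry the score vanishes at theta = 0.
items = [([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0])] * 2
theta_hat = mle(items, [1, 3])
```

In a full CAT loop, this estimate would be recomputed after each response and passed to the item selection rule (3.3).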

3.3. Asymptotic Analysis

We now focus on the asymptotic properties of $\hat\theta_n$ as $n \to \infty$; thus, we will
assume without loss of generality that $\hat\theta_n$ is the root of $S_n(\theta)$ for sufficiently
large values of $n$. Specifically, we will establish the strong consistency of $\hat\theta_n$
for any item selection strategy and its asymptotic normality and efficiency
when the information-maximizing item selection (3.3) is adopted. Both properties
rely heavily on the martingale property of the score function, $S_n(\theta)$,
which is established in the following proposition.


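Proposition 1. For any item selection strategy, $(\mathbf{b}_n)_{n \in \mathbb{N}}$, the score process
$\{S_n(\theta)\}_{n \in \mathbb{N}}$ is a $(P_\theta, \{\mathcal{F}_n\}_{n \in \mathbb{N}})$-martingale with bounded increments, mean 0
and predictable variation $\langle S(\theta) \rangle_n = I_n(\theta)$, where

$$I_n(\theta) := \sum_{i=1}^{n} J(\theta; \mathbf{b}_i), \quad n \in \mathbb{N}, \tag{3.5}$$

and $J(\theta; \mathbf{b}_i)$ is the Fisher information of the $i$th item, defined in (2.8). Moreover,
for any $\tilde\theta$ we have

$$S_n'(\tilde\theta) := \frac{d}{d\theta} S_n(\theta) \Big|_{\theta = \tilde\theta} = -I_n(\tilde\theta). \tag{3.6}$$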


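Proof. For any $n \in \mathbb{N}$,

$$S_n(\theta) - S_{n-1}(\theta) = s(\theta; \mathbf{b}_n, X_n) = \sum_{k=1}^{m} \big(a_{nk} - \bar{a}(\theta; \mathbf{b}_n)\big)\, 1_{\{X_n = k\}}. \tag{3.7}$$

Therefore, $|S_n(\theta) - S_{n-1}(\theta)| \le 2K$ for every $n \in \mathbb{N}$. Moreover, since $\mathbf{b}_n$ is an
$\mathcal{F}_{n-1}$-measurable random vector, it follows directly from (2.6) that

$$E_\theta[S_n(\theta) - S_{n-1}(\theta) \mid \mathcal{F}_{n-1}] = E_\theta[s(\theta; \mathbf{b}_n, X_n) \mid \mathcal{F}_{n-1}] = 0,$$

which proves the martingale property of $S_n(\theta)$. Next, from (2.8) it follows
that

$$E_\theta[(S_n(\theta) - S_{n-1}(\theta))^2 \mid \mathcal{F}_{n-1}] = E_\theta[s^2(\theta; \mathbf{b}_n, X_n) \mid \mathcal{F}_{n-1}] = J(\theta; \mathbf{b}_n),$$

which proves that $\langle S(\theta) \rangle_n = \sum_{i=1}^{n} J(\theta; \mathbf{b}_i)$. Finally, from (2.9) it follows that
for any $\tilde\theta$ we have

$$S_n'(\tilde\theta) = \frac{d}{d\theta} S_n(\theta) \Big|_{\theta = \tilde\theta} = \sum_{i=1}^{n} -J(\tilde\theta; \mathbf{b}_i) = -I_n(\tilde\theta),$$

which completes the proof. □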

With the next theorem we establish the strong consistency of $\hat\theta_n$ for any
item selection strategy.

Theorem 3.1. For any item selection strategy, as $n \to \infty$ we have

$$\hat\theta_n \to \theta \quad \text{and} \quad \frac{I_n(\hat\theta_n)}{I_n(\theta)} \to 1 \quad P_\theta\text{-a.s.} \tag{3.8}$$


Proof. Let $(\mathbf{b}_n)_{n \in \mathbb{N}}$ be an arbitrary item selection strategy. From Proposition 1
it follows that $S_n(\theta)$ is a $P_\theta$-martingale with mean 0 and predictable
variation $I_n(\theta) \ge n J_*(\theta) \to \infty$, since $J_*(\theta) > 0$. Then, from the Martingale
Strong Law of Large Numbers (see, e.g., [27], p. 124), it follows that as
$n \to \infty$

$$\frac{S_n(\theta)}{I_n(\theta)} \to 0 \quad P_\theta\text{-a.s.} \tag{3.9}$$

From a Taylor expansion of $S_n(\theta)$ around $\hat\theta_n$ it follows that there exists some
$\tilde\theta_n$ that lies between $\hat\theta_n$ and $\theta$ so that

$$0 = S_n(\hat\theta_n) = S_n(\theta) + S_n'(\tilde\theta_n)(\hat\theta_n - \theta) = S_n(\theta) - I_n(\tilde\theta_n)(\hat\theta_n - \theta), \tag{3.10}$$

where the second equality follows from (3.6). From (3.9) and (3.10) we then
obtain

$$\frac{I_n(\tilde\theta_n)}{I_n(\theta)}\, (\hat\theta_n - \theta) \to 0 \quad P_\theta\text{-a.s.}$$

The strong consistency of $\hat\theta_n$ will then follow as long as we can guarantee
that the fraction in the last relationship remains bounded away from 0 as
$n \to \infty$. However, for every $n$ we have

$$\frac{I_n(\tilde\theta_n)}{I_n(\theta)} = \frac{\sum_{i=1}^{n} J(\tilde\theta_n; \mathbf{b}_i)}{\sum_{i=1}^{n} J(\theta; \mathbf{b}_i)} \ge \frac{n J_*(\tilde\theta_n)}{n J^*(\theta)} = \frac{J_*(\tilde\theta_n)}{J^*(\theta)}.$$

Since $J^*(\theta) > 0$, it suffices to show that $P_\theta(\liminf_n J_*(\tilde\theta_n) > 0) = 1$. Since
$J_*(\theta)$ is continuous, positive and bounded away from 0 when $|\theta|$ is bounded
away from infinity (recall (2.11)) and $\tilde\theta_n$ lies between $\hat\theta_n$ and $\theta$, it suffices to
show that

$$P_\theta\Big(\limsup_n |\hat\theta_n| < \infty\Big) = 1. \tag{3.11}$$
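
In order to prove (3.11), we observe first of all that since $S_n(\hat\theta_n) = 0$ for
large $n$, (3.9) can be rewritten as follows:

$$\frac{S_n(\theta) - S_n(\hat\theta_n)}{I_n(\theta)} \to 0 \quad P_\theta\text{-a.s.} \tag{3.12}$$

But for every $n$ we have $I_n(\theta) \le n J^*(\theta)$ and

$$S_n(\theta) - S_n(\hat\theta_n) = \sum_{i=1}^{n} \big[s(\theta; \mathbf{b}_i, X_i) - s(\hat\theta_n; \mathbf{b}_i, X_i)\big] = \sum_{i=1}^{n} \big[\bar{a}(\hat\theta_n; \mathbf{b}_i) - \bar{a}(\theta; \mathbf{b}_i)\big] \ge n \inf_{\mathbf{b} \in \mathbb{B}} \big[\bar{a}(\hat\theta_n; \mathbf{b}) - \bar{a}(\theta; \mathbf{b})\big],$$

therefore we obtain

$$\frac{S_n(\theta) - S_n(\hat\theta_n)}{I_n(\theta)} \ge \frac{\inf_{\mathbf{b} \in \mathbb{B}} \big[\bar{a}(\hat\theta_n; \mathbf{b}) - \bar{a}(\theta; \mathbf{b})\big]}{J^*(\theta)}. \tag{3.13}$$

On the event $\{\limsup_n \hat\theta_n = \infty\}$ there exists a subsequence $(\hat\theta_{n_j})$ of $(\hat\theta_n)$
such that $\hat\theta_{n_j} \to \infty$. Consequently, for any $\mathbf{b} \in \mathbb{B}$ we have

$$\lim_{n_j \to \infty} \big[\bar{a}(\hat\theta_{n_j}; \mathbf{b}) - \bar{a}(\theta; \mathbf{b})\big] = a^*(\mathbf{b}) - \bar{a}(\theta; \mathbf{b}) > 0. \tag{3.14}$$

Since $a^*(\mathbf{b}) - \bar{a}(\theta; \mathbf{b})$ is jointly continuous in $\theta$ and $\mathbf{b}$, from Lemma 1 we
obtain

$$\liminf_{n_j \to \infty} \inf_{\mathbf{b} \in \mathbb{B}} \big[\bar{a}(\hat\theta_{n_j}; \mathbf{b}) - \bar{a}(\theta; \mathbf{b})\big] \ge \inf_{\mathbf{b} \in \mathbb{B}} \big[a^*(\mathbf{b}) - \bar{a}(\theta; \mathbf{b})\big] > 0. \tag{3.15}$$

From (3.13) and (3.15) it follows that

$$\liminf_{n_j \to \infty} \frac{S_{n_j}(\theta) - S_{n_j}(\hat\theta_{n_j})}{I_{n_j}(\theta)} > 0$$

and comparing with (3.12) we conclude that $P_\theta(\limsup_n \hat\theta_n = \infty) = 0$. In an
identical way we can show that $P_\theta(\liminf_n \hat\theta_n = -\infty) = 0$, which establishes
(3.11) and completes the proof of the strong consistency of $\hat\theta_n$. In order to
prove the second part of (3.8), we observe that

$$\frac{|I_n(\hat\theta_n) - I_n(\theta)|}{I_n(\theta)} \le \frac{1}{n J_*(\theta)} \sum_{i=1}^{n} |J(\hat\theta_n; \mathbf{b}_i) - J(\theta; \mathbf{b}_i)| \le \frac{1}{J_*(\theta)} \sup_{\mathbf{b} \in \mathbb{B}} |J(\hat\theta_n; \mathbf{b}) - J(\theta; \mathbf{b})|.$$

But since $J(\theta; \mathbf{b})$ is jointly continuous and $\hat\theta_n$ is strongly consistent, from
Lemma 1 it follows that the upper bound goes to 0 almost surely, which
completes the proof. □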

While the strong consistency of $\hat\theta_n$ could be established for any item selection
strategy, its asymptotic normality and efficiency require the
information-maximizing item selection strategy (3.3).

Theorem 3.2. If the information-maximizing item selection strategy (3.3)
is used, then $\hat\theta_n$ is asymptotically normal as $n \to \infty$, since

$$\sqrt{I_n(\hat\theta_n)}\, (\hat\theta_n - \theta) \to \mathcal{N}(0, 1) \tag{3.16}$$

and asymptotically efficient, in the sense that

$$\sqrt{n}\, (\hat\theta_n - \theta) \to \mathcal{N}\big(0, [J^*(\theta)]^{-1}\big). \tag{3.17}$$

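Proof. We denote by $(\mathbf{b}_i)_{1 \le i \le n}$ the information-maximizing item selection
strategy (3.3). We will start by showing that as $n \to \infty$

$$\frac{1}{n} I_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} J(\theta; \mathbf{b}_i) \to J^*(\theta) \quad P_\theta\text{-a.s.} \tag{3.18}$$

In order to do so, it suffices to show that $J(\theta; \mathbf{b}_n) \to J^*(\theta)$ $P_\theta$-a.s. Since
$J(\theta; \mathbf{b})$ is jointly continuous and $\hat\theta_n$ is a strongly consistent estimator of $\theta$,
from Lemma 1 we have

$$|J(\hat\theta_n; \mathbf{b}_n) - J(\theta; \mathbf{b}_n)| \le \sup_{\mathbf{b} \in \mathbb{B}} |J(\hat\theta_n; \mathbf{b}) - J(\theta; \mathbf{b})| \to 0 \quad P_\theta\text{-a.s.} \tag{3.19}$$

Therefore, we only need to show that $J(\hat\theta_n; \mathbf{b}_n) \to J^*(\theta)$ $P_\theta$-a.s. But from
the definition of $(\mathbf{b}_n)$ in (3.3) we have that $J(\hat\theta_{n-1}; \mathbf{b}_n) = J^*(\hat\theta_{n-1})$, therefore
from the triangle inequality we obtain:

$$|J(\hat\theta_n; \mathbf{b}_n) - J^*(\theta)| \le |J(\hat\theta_n; \mathbf{b}_n) - J(\hat\theta_{n-1}; \mathbf{b}_n)| + |J^*(\hat\theta_{n-1}) - J^*(\theta)|$$
$$\le \sup_{\mathbf{b} \in \mathbb{B}} |J(\hat\theta_n; \mathbf{b}) - J(\hat\theta_{n-1}; \mathbf{b})| + |J^*(\hat\theta_{n-1}) - J^*(\theta)|.$$

Since $\hat\theta_n$ is a strongly consistent estimator of $\theta$, from Lemma 1 it follows
that

$$\sup_{\mathbf{b} \in \mathbb{B}} |J(\hat\theta_n; \mathbf{b}) - J(\hat\theta_{n-1}; \mathbf{b})| \to 0 \quad P_\theta\text{-a.s.},$$

whereas from the continuity of $J^*$ we obtain $J^*(\hat\theta_{n-1}) \to J^*(\theta)$ $P_\theta$-a.s.,
which completes the proof of (3.18).

Now, from Proposition 1 we know that $\{S_n(\theta)\}_{n \in \mathbb{N}}$ is a martingale with
bounded increments, mean 0 and predictable variation $I_n(\theta)$. Then, due to
(3.18), we can apply the Martingale Central Limit Theorem (see, e.g., [1],
Ex. 35.19, p. 481) and obtain

$$\frac{S_n(\theta)}{\sqrt{I_n(\theta)}} \to \mathcal{N}(0, 1).$$

Using the Taylor expansion (3.10), we have

$$\frac{I_n(\tilde\theta_n)}{I_n(\theta)}\, \sqrt{I_n(\theta)}\, (\hat\theta_n - \theta) \to \mathcal{N}(0, 1),$$

where $\tilde\theta_n$ lies between $\hat\theta_n$ and $\theta$. But, from (3.8) it follows that

$$\frac{I_n(\tilde\theta_n)}{I_n(\theta)} \to 1 \quad P_\theta\text{-a.s.},$$

thus, from an application of Slutsky’s theorem we obtain

$$\sqrt{I_n(\theta)}\, (\hat\theta_n - \theta) \to \mathcal{N}(0, 1). \tag{3.20}$$

Finally, from (3.20) and (3.18) we obtain (3.17), whereas from (3.20)
and (3.8) we obtain (3.16), which completes the proof. □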
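
One practical reading of (3.16) is that, for large $n$, an approximate confidence interval for $\theta$ can be formed from the final estimate and the observed information $I_n(\hat\theta_n)$. This use is not discussed in the text; the sketch below is a standard consequence of asymptotic normality, and the function name and the default normal quantile (about 1.96, for a nominal 95% level) are assumptions.

```python
import math


def wald_interval(theta_hat, information, z=1.959963984540054):
    """Approximate interval theta_hat +/- z / sqrt(I_n(theta_hat)).

    `information` is the accumulated Fisher information evaluated at the
    estimate; `z` is a standard normal quantile chosen by the user.
    """
    if information <= 0.0:
        raise ValueError("information must be positive")
    half_width = z / math.sqrt(information)
    return theta_hat - half_width, theta_hat + half_width


lower, upper = wald_interval(0.5, 25.0)
```

The interval narrows at rate $1/\sqrt{n}$ when the information grows linearly in $n$, which is the situation described by (3.18).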


4. CAT with response revision 

In this section we consider the design of CAT when response revision is 
allowed. As before, we consider multiple-choice items that have m categories 
and we assume that the total number of items that will be administered is 
fixed and equal to n. However, at any time during the test the examinee 
can go back and revise (i.e., change) the answer to a previous item. The 
only restriction that we impose is that each item can be revised at most
$m - 2$ times during the test. As a result, we now focus on items with $m \ge 3$
categories, unlike the previous section where the case of binary items ($m = 2$)
was also included. Moreover, due to the possibility of revisions, the total
number of responses (first answers and revisions) that are observed, $\tau_n$, is
random, even though the total number of administered items, $n$, is fixed.
In any case, $n \le \tau_n \le (m - 1)n$, with the lower bound corresponding to
the case of no revisions and the upper bound to the case that all items are
revised as many times as possible.


4.1. Setup

In order to formulate the problem in more detail, suppose that at some point
during the test we have collected $t$ responses, and let $f_t$ be the number of distinct
items that have been administered and $r_t := t - f_t$ the number of revisions.
For each item $i \in \{1, \ldots, f_t\}$, we denote by $g_t^i$ the number of responses that
correspond to this particular item. Since each item can be revised up to
$m - 2$ times, we have $1 \le g_t^i \le m - 1$.

After completing the $t$th response, the examinee decides whether to revise
one of the previous items or to proceed to a new item. Specifically, let
$C_t := \{i \in \{1, \ldots, f_t\} : g_t^i < m - 1\}$ be the set of items that can still be revised. The
decision of the examinee is then captured by the following random variable:

$$d_t := \begin{cases} 0, & \text{if the } (t+1)\text{th response will correspond to a new item,} \\ i, & \text{if the } (t+1)\text{th response is a revision of item } i \in C_t, \end{cases}$$

with the understanding that $d_t = 0$ when $C_t = \emptyset$. Then, $\mathcal{G}_t := \sigma(f_{1:t}, d_{1:t})$ is
the $\sigma$-algebra that contains all information regarding the history of revisions,
where for compactness we write $f_{1:t} := (f_1, \ldots, f_t)$ and $d_{1:t} := (d_1, \ldots, d_t)$.

Of course, we also observe the responses of the examinee during the test.
For each item $i \in \{1, \ldots, f_t\}$, let $A^i_{j-1}$ be the set of remaining categories
just before the $j$th attempt on this particular item, where $1 \le j \le g_t^i$. Thus,
$A^i_0 = \{1, \ldots, m\}$ is the set of all categories and $A^i_{j-1}$ is a random set for
$j > 1$. Let $X^i_j$ be the response that corresponds to the $j$th attempt, so that
$X^i_j = k$ if category $k$ is chosen on the $j$th attempt on item $i$, where $k \in A^i_{j-1}$.
Then,

$$\mathcal{F}^X_t := \sigma\big(X^i_{1:g_t^i},\; 1 \le i \le f_t\big), \quad \text{where } X^i_{1:g_t^i} := (X^i_1, \ldots, X^i_{g_t^i}),$$

is the $\sigma$-algebra that captures the information from the observed responses
and $\mathcal{F}_t := \mathcal{G}_t \vee \mathcal{F}^X_t$ the $\sigma$-algebra that contains all the available information
up to this time.
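
The bookkeeping quantities of this setup can be recovered from the sequence of decisions alone. The sketch below is an illustration with hypothetical names: it takes the decisions $d_1, \ldots, d_{t-1}$ (0 for a new item, $i$ for a revision of item $i$) and returns $t$, $f_t$, $r_t$, the counts $g_t^i$, and the set $C_t$ of items that can still be revised.

```python
def revision_state(decisions, m):
    """Reconstruct (t, f_t, r_t, [g_t^1, ..., g_t^{f_t}], C_t) from d_1..d_{t-1}.

    Each item with m categories admits at most m - 1 responses in total,
    i.e. one first answer plus at most m - 2 revisions.
    """
    counts = [1]  # the first response is always the first answer to item 1
    for d in decisions:
        if d == 0:
            counts.append(1)  # a new item is administered
        elif 1 <= d <= len(counts) and counts[d - 1] < m - 1:
            counts[d - 1] += 1  # item d is revised once more
        else:
            raise ValueError(f"item {d} cannot be revised")
    t = sum(counts)
    f_t = len(counts)
    revisable = [i for i, g in enumerate(counts, start=1) if g < m - 1]
    return t, f_t, t - f_t, counts, revisable


# m = 4: answer item 1, move to item 2, revise item 1 twice, move to item 3.
state = revision_state([0, 1, 1, 0], m=4)
```

In this example item 1 has used its $m - 2 = 2$ revisions, so only items 2 and 3 remain in $C_t$.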

4.2. Modeling assumptions

As in the case of the regular CAT that we considered in the previous section,
we assume that the first response to each item is governed by the nominal
response model, so that for every $1 \le i \le f_t$ we have

$$P_\theta(X^i_1 = k \mid \mathbf{b}_i) := p_k(\theta; \mathbf{b}_i), \quad 1 \le k \le m, \tag{4.1}$$

where $\mathbf{b}_i := (a_{i2}, \ldots, a_{im}, c_{i2}, \ldots, c_{im})$ is a $\mathbb{B}$-valued vector that characterizes
item $i$ and satisfies (2.2)-(2.3) and $p_k(\theta; \mathbf{b}_i)$ is the pmf of the nominal response
model defined in (2.4). But we now further assume that revisions are also
governed by the nominal response model, so that for every $2 \le j \le g_t^i$ we
have

$$P_\theta\big(X^i_j = k \mid X^i_{1:j-1}, \mathbf{b}_i\big) := \frac{p_k(\theta; \mathbf{b}_i)}{\sum_{h \in A^i_{j-1}} p_h(\theta; \mathbf{b}_i)}, \quad k \in A^i_{j-1}, \tag{4.2}$$

where $X^i_{1:j} := (X^i_1, \ldots, X^i_j)$. Moreover, we assume, as in the previous section,
that responses coming from different items are conditionally independent,
so that

$$P_\theta\big(X^i_{1:g_t^i},\; 1 \le i \le f_t \mid d_{1:t}, \mathbf{b}_{1:f_t}\big) = \prod_{i=1}^{f_t} P_\theta\big(X^i_{1:g_t^i} \mid d_{1:t}, \mathbf{b}_i\big), \tag{4.3}$$

where for compactness we write $\mathbf{b}_{1:f_t} := (\mathbf{b}_1, \ldots, \mathbf{b}_{f_t})$. Finally, we additionally
assume that the observed responses on any given item are conditionally
independent of the time during the test at which they were given. In other
words, for every $1 \le i \le f_t$ we have

$$P_\theta\big(X^i_{1:g_t^i} \mid d_{1:t}, \mathbf{b}_i\big) = P_\theta\big(X^i_1 \mid \mathbf{b}_i\big) \prod_{j=2}^{g_t^i} P_\theta\big(X^i_j \mid X^i_{1:j-1}, \mathbf{b}_i\big). \tag{4.4}$$

3=2 

The above assumptions specify completely the probability in the left-hand
side of (4.3). Specifically, (4.3) and (4.4) imply that

\[ P_\theta\big(X^i_{1:g^i_t},\ 1 \le i \le f_t \mid d_{1:t}, b_{1:f_t}\big) = \prod_{i=1}^{f_t} \Big[ P_\theta(X^i_1 \mid b_i) \prod_{j=2}^{g^i_t} P_\theta\big(X^i_j \mid X^i_{1:j-1}, b_i\big) \Big] \tag{4.5} \]

and the probabilities in the right-hand side are determined by the nominal
response model according to (4.1)-(4.2). On the other hand, we do not model
the decision of the examinee whether to revise or not at each step, i.e., we
do not specify the conditional probability $P_\theta(d_{1:t} \mid \cdot)$ of the revision
decisions. While this probability may depend on $\theta$ and
provide useful information about the ability of the examinee, its specification is
a rather difficult task. Nevertheless, the above assumptions will be sufficient
for the design and analysis of CAT that allows for response revision.
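
To make (4.1)-(4.2) concrete, the following minimal Python sketch computes the category probabilities of the first response and their renormalised version for a revision. It assumes the usual softmax form of the nominal response model for (2.4), $p_k(\theta; b) \propto \exp(a_k \theta + c_k)$ with $a_1 = c_1 = 0$; the function names `nominal_pmf` and `revision_pmf` and the 0-based category labels are illustrative choices, not notation from the paper.

```python
import math


def nominal_pmf(theta, a, c):
    """Category probabilities p_k(theta; b) of the nominal response model.

    a and c are length-m sequences of slopes and intercepts; the convention
    a[0] = c[0] = 0 is assumed here (illustrative 0-based labelling).
    """
    logits = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def revision_pmf(theta, a, c, remaining):
    """Probabilities of a revised response, renormalised over the
    categories that are still available, in the spirit of (4.2)."""
    p = nominal_pmf(theta, a, c)
    mass = sum(p[k] for k in remaining)
    return {k: p[k] / mass for k in remaining}
```

For example, `revision_pmf(0.0, [0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [1, 2])` spreads the probability equally over the two remaining categories, since all three categories are equally likely at $\theta = 0$ under these parameters.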

4.3. Problem formulation

As we mentioned in the beginning of the section, the total number of
administered items is fixed and will be denoted by $n$, as in the case of the
regular CAT. However, due to the possibility of revision, the total number
of responses will now be random and denoted by $\tau_n$. Indeed, the test will
stop when $n$ items have been distributed and the examinee does not want
to (or cannot) revise any more items. More formally,

\[ \tau_n := \min\{t \ge 1 : f_t = n \text{ and } d_t = 0\}, \]

which reveals that $\tau_n$ is a stopping time with respect to the filtration $\{\mathcal{G}_t\}$ and,
of course, $\{\mathcal{F}_t\}$. Note that, for every $1 \le i \le n-1$,

\[ \tau_i := \min\{t \ge 1 : f_t = i \text{ and } d_t = 0\} \]

is the time at which the $(i+1)$th item needs to be selected and its selection
will depend on the available information up to this time. That is, we will now
say that $(b_{i+1})_{1 \le i \le n-1}$ is an item selection strategy if the parameter vector
that characterizes the $(i+1)$th item, $b_{i+1}$, is a $\mathbb{B}$-valued, $\mathcal{F}_{\tau_i}$-measurable
random vector. As in the case of the standard CAT, items need to be selected
so that the accuracy of the final estimator of $\theta$, $\hat\theta_{\tau_n}$, is maximized. As in
the previous section, a reasonable approach is to select the items in order
to maximize the Fisher information of the nominal response model at the
current ability estimate. Thus, after each observation $t$ until the end of the
test, we need an $\mathcal{F}_t$-measurable random variable, $\hat\theta_t$, that will provide the
current estimate for the ability parameter, $\theta$.

4.4. Adaptive ability estimation based on a partial likelihood


Our estimate for $\theta$ after the first $t$ responses will be the maximizer of the
conditional log-likelihood of the acquired observations given the selected
items and the revision strategy of the examinee:

\[ L_t(\theta) := \log P_\theta\big(X^i_{1:g^i_t},\ i = 1, \ldots, f_t \mid d_{1:t}, b_{1:f_t}\big). \tag{4.6} \]


In order to lighten the notation, for every $2 \le j \le g^i_t$ we will use the following
notation

\[ p_k(\theta; b_i \mid X^i_{1:j-1}) := P_\theta(X^i_j = k \mid X^i_{1:j-1}, b_i), \quad k \in A^i_{j-1}, \tag{4.7} \]

for the conditional probability that is determined in (4.2) and we will further
use the following notation for the corresponding log-likelihood:

\[ \ell(\theta; b_i, X^i_j = k \mid X^i_{1:j-1}) := \log p_k(\theta; b_i \mid X^i_{1:j-1}), \quad k \in A^i_{j-1}. \]

Then from (4.5) we have

\[ L_t(\theta) = \sum_{i=1}^{f_t} \Big[ \ell(\theta; b_i, X^i_1) + \sum_{j=2}^{g^i_t} \ell(\theta; b_i, X^i_j \mid X^i_{1:j-1}) \Big], \tag{4.8} \]

where $\ell(\theta; b_i, X^i_1)$ is defined according to (2.5) and the corresponding score
function takes the form

\[ S_t(\theta) := \frac{d}{d\theta} L_t(\theta) = \sum_{i=1}^{f_t} \Big[ s(\theta; b_i, X^i_1) + \sum_{j=2}^{g^i_t} s(\theta; b_i, X^i_j \mid X^i_{1:j-1}) \Big], \tag{4.9} \]

where $s(\theta; b_i, X^i_1)$ is defined according to (2.6) and for every $2 \le j \le g^i_t$ we
have

\[ s(\theta; b_i, X^i_j \mid X^i_{1:j-1}) := \frac{d}{d\theta}\, \ell(\theta; b_i, X^i_j \mid X^i_{1:j-1}) = \sum_{k \in A^i_{j-1}} \big[ a_{ik} - \bar a(\theta; b_i \mid X^i_{1:j-1}) \big]\, 1_{\{X^i_j = k\}} \]

and $\bar a(\theta; b_i \mid X^i_{1:j-1}) := \sum_{k \in A^i_{j-1}} a_{ik}\, p_k(\theta; b_i \mid X^i_{1:j-1})$.

Our estimate for $\theta$ after the first $t$ responses will be the root of $S_t(\theta)$. As
in the case of the regular CAT, this root will exist for every $t > t_0$, where
$t_0$ is some random time. Thus, for $t \le t_0$ we need an alternative estimating
scheme. This, however, will not affect the asymptotic properties of our
estimator as the number of administered items, $n$, goes to infinity, which will
be the focus of the remainder of this section.
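
The estimating equation $S_t(\theta) = 0$ can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes the softmax form $p_k(\theta; b) \propto \exp(a_k\theta + c_k)$ for the nominal response model, assumes that each attempt removes the chosen category from the set of remaining categories, and uses plain bisection on a hypothetical search interval $[-10, 10]$. The names `restricted_mean_slope`, `score` and `estimate_ability` are illustrative.

```python
import math


def restricted_mean_slope(theta, a, c, remaining):
    """Mean slope a-bar(theta; b | history) over the categories still available."""
    logits = {k: a[k] * theta + c[k] for k in remaining}
    top = max(logits.values())  # stabilise the exponentials
    weights = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(weights.values())
    return sum(a[k] * weights[k] for k in remaining) / total


def score(theta, items):
    """Score of the partial likelihood, following the structure of (4.9).

    items is a list of (a, c, responses); responses lists the 0-based
    categories chosen on successive attempts on that item.
    """
    total = 0.0
    for a, c, responses in items:
        remaining = list(range(len(a)))
        for x in responses:
            total += a[x] - restricted_mean_slope(theta, a, c, remaining)
            remaining.remove(x)  # assumed: a chosen category cannot be chosen again
    return total


def estimate_ability(items, lo=-10.0, hi=10.0, tol=1e-10):
    """Root of the score by bisection, or None if there is no sign change."""
    if not (score(lo, items) > 0.0 > score(hi, items)):
        return None  # no root in [lo, hi]: an alternative scheme is needed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, items) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is adequate here because the score is nonincreasing in $\theta$ (its derivative is minus a sum of variances). The `None` branch corresponds to the early phase of the test in which the root does not yet exist.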


4.5. Asymptotic analysis


Our asymptotic analysis will be based on the martingale property of the
score function, $S_t(\theta)$, which is established in the following proposition.

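Proposition 2. For any item selection strategy and any revision strategy,
$\{S_t(\theta)\}_{t \in \mathbb{N}}$ is a $(P_\theta, \{\mathcal{F}_t\}_{t \in \mathbb{N}})$-martingale with bounded increments, mean
zero and predictable variation $\langle S(\theta) \rangle_t = I_t(\theta)$, where

\[ I_t(\theta) := \sum_{i=1}^{f_t} J(\theta; b_i) + I^R_t(\theta), \qquad I^R_t(\theta) := \sum_{i=1}^{f_t} \sum_{j=2}^{g^i_t} J(\theta; b_i \mid X^i_{1:j-1}), \tag{4.10} \]

where $J(\theta; b_i)$ is defined in (2.8) and

\[ J(\theta; b_i \mid X^i_{1:j-1}) := E_\theta\big[ s^2(\theta; b_i, X^i_j \mid X^i_{1:j-1}) \mid X^i_{1:j-1} \big] = \sum_{k \in A^i_{j-1}} \big( a_{ik} - \bar a(\theta; b_i \mid X^i_{1:j-1}) \big)^2\, p_k(\theta; b_i \mid X^i_{1:j-1}). \]

Finally, for any $\tilde\theta$ we have

\[ S_t'(\tilde\theta) := \frac{d}{d\theta} S_t(\theta) \Big|_{\theta = \tilde\theta} = -I_t(\tilde\theta). \tag{4.11} \]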


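Proof. After having completed the $(t-1)$th response, the examinee either
proceeds with a new item or chooses to revise a previous item. Therefore,
the difference $S_t(\theta) - S_{t-1}(\theta)$ admits the following decomposition:

\[ s(\theta; b_{f_t}, X^{f_t}_1)\, 1_{\{d_{t-1} = 0\}} + \sum_{i \in C_{t-1}} s\big(\theta; b_i, X^i_{g^i_t} \mid X^i_{1:g^i_t - 1}\big)\, 1_{\{d_{t-1} = i\}}, \tag{4.12} \]

where the sum is null when $C_{t-1} = \emptyset$. Since $d_{t-1}$, $C_{t-1}$ are $\mathcal{F}_{t-1}$-measurable,
taking conditional expectations with respect to $\mathcal{F}_{t-1}$ we obtain

\[ E_\theta[S_t(\theta) - S_{t-1}(\theta) \mid \mathcal{F}_{t-1}] = E_\theta\big[ s(\theta; b_{f_t}, X^{f_t}_1) \mid \mathcal{F}_{t-1} \big]\, 1_{\{d_{t-1} = 0\}} + \sum_{i \in C_{t-1}} E_\theta\big[ s\big(\theta; b_i, X^i_{g^i_t} \mid X^i_{1:g^i_t - 1}\big) \mid \mathcal{F}_{t-1} \big]\, 1_{\{d_{t-1} = i\}}. \]

Since $f_t$ is $\mathcal{F}_{t-1}$-measurable, it follows that

\[ E_\theta\big[ s(\theta; b_{f_t}, X^{f_t}_1) \mid \mathcal{F}_{t-1} \big] = 0 \]

and since $g^i_t$ is also $\mathcal{F}_{t-1}$-measurable, it follows that

\[ E_\theta\big[ s\big(\theta; b_i, X^i_{g^i_t} \mid X^i_{1:g^i_t - 1}\big) \mid \mathcal{F}_{t-1} \big] = 0, \]

which proves that $S_t(\theta)$ is a zero-mean martingale with respect to $(P_\theta, \{\mathcal{F}_t\}_{t \in \mathbb{N}})$.
Now, taking squares in (4.12) we obtain

\[ E_\theta\big[ (S_t(\theta) - S_{t-1}(\theta))^2 \mid \mathcal{F}_{t-1} \big] = J(\theta; b_{f_t})\, 1_{\{d_{t-1} = 0\}} + \sum_{i \in C_{t-1}} J\big(\theta; b_i \mid X^i_{1:g^i_t - 1}\big)\, 1_{\{d_{t-1} = i\}} \]

and consequently the predictable variation of $S_t(\theta)$ will be

\[ \langle S(\theta) \rangle_t = \sum_{v=1}^{t} E_\theta\big[ (S_v(\theta) - S_{v-1}(\theta))^2 \mid \mathcal{F}_{v-1} \big] = \sum_{v=1}^{t} \Big[ J(\theta; b_{f_v})\, 1_{\{d_{v-1} = 0\}} + \sum_{i \in C_{v-1}} J\big(\theta; b_i \mid X^i_{1:g^i_v - 1}\big)\, 1_{\{d_{v-1} = i\}} \Big] = \sum_{i=1}^{f_t} \Big[ J(\theta; b_i) + \sum_{h=2}^{g^i_t} J(\theta; b_i \mid X^i_{1:h-1}) \Big] = I_t(\theta). \]

□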
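
The identity (4.11) is stated in Proposition 2 without an explicit computation. As a brief check, assume the usual softmax form of the nominal response model, $p_k \propto \exp(a_k \theta + c_k)$ over a set $A$ of available categories (an assumption about (2.4), which is not reproduced in this section), and write $\bar a = \sum_{k \in A} a_k p_k$ and $J = \sum_{k \in A} (a_k - \bar a)^2 p_k$ for a single response:

```latex
\begin{align*}
\frac{d}{d\theta}\, p_k &= p_k\,(a_k - \bar a), \\
\frac{d}{d\theta}\, \bar a &= \sum_{k \in A} a_k\, p_k\,(a_k - \bar a)
   = \sum_{k \in A} a_k^2\, p_k - \bar a^{\,2}
   = \sum_{k \in A} (a_k - \bar a)^2\, p_k = J, \\
\frac{d}{d\theta}\,(a_k - \bar a) &= -J .
\end{align*}
```

Each summand of the score therefore has derivative equal to minus the corresponding information term, and summing over first responses and revisions gives $S_t'(\theta) = -I_t(\theta)$.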


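We can now establish the strong consistency of $\hat\theta_{\tau_n}$ as $n \to \infty$ without
any conditions on the item selection or the revision strategy.

Theorem 4.1. For any item selection method and any revision strategy, as
$n \to \infty$ we have

\[ \hat\theta_{\tau_n} \to \theta \quad \text{and} \quad \frac{I_{\tau_n}(\hat\theta_{\tau_n})}{I_{\tau_n}(\theta)} \to 1 \quad P_\theta\text{-a.s.} \tag{4.13} \]

Proof. From Proposition 2 we have that $S_t(\theta)$ is a $(P_\theta, \{\mathcal{F}_t\})$-martingale.
Moreover, $(\tau_n)_{n \in \mathbb{N}}$ is a strictly increasing sequence of (bounded) $\{\mathcal{F}_t\}$-stopping
times. Then, from an application of the Optional Sampling Theorem
it follows that $S_{\tau_n}(\theta)$ is a $(P_\theta, \{\mathcal{F}_{\tau_n}\})$-martingale with predictable
variation $I_{\tau_n}(\theta)$. Moreover, from (4.10) we have $I_{\tau_n}(\theta) \ge n J_*(\theta) \to \infty$, therefore
from the Martingale Strong Law of Large Numbers ([27], p. 124) it follows
that

\[ \frac{S_{\tau_n}(\theta)}{I_{\tau_n}(\theta)} \to 0 \quad P_\theta\text{-a.s.} \tag{4.14} \]

Then, by a Taylor expansion around $\theta$ and (4.11) we have

\[ 0 = S_{\tau_n}(\hat\theta_{\tau_n}) = S_{\tau_n}(\theta) + S'_{\tau_n}(\tilde\theta_{\tau_n})(\hat\theta_{\tau_n} - \theta) = S_{\tau_n}(\theta) - I_{\tau_n}(\tilde\theta_{\tau_n})(\hat\theta_{\tau_n} - \theta), \tag{4.15} \]

where $\tilde\theta_{\tau_n}$ lies between $\hat\theta_{\tau_n}$ and $\theta$, and (4.14) takes the form

\[ \frac{I_{\tau_n}(\tilde\theta_{\tau_n})}{I_{\tau_n}(\theta)}\, (\hat\theta_{\tau_n} - \theta) \to 0 \quad P_\theta\text{-a.s.} \]

However, since $\tau_n \le (m-1)n$ and $J_*(\theta) f_t \le I_t(\theta) \le K t$ for every $t$, where
$K$ is defined in (2.13), we have

\[ \frac{I_{\tau_n}(\tilde\theta_{\tau_n})}{I_{\tau_n}(\theta)} \ge \frac{n J_*(\tilde\theta_{\tau_n})}{\tau_n K} \ge \frac{1}{m-1}\, \frac{J_*(\tilde\theta_{\tau_n})}{K} \]

and it suffices to show that

\[ \limsup_n |\hat\theta_{\tau_n}| < \infty \quad P_\theta\text{-a.s.} \tag{4.16} \]

Now, for large $n$ we have $S_{\tau_n}(\hat\theta_{\tau_n}) = 0$ and (4.14) can be rewritten as follows:

\[ \frac{S_{\tau_n}(\theta) - S_{\tau_n}(\hat\theta_{\tau_n})}{I_{\tau_n}(\theta)} \to 0 \quad P_\theta\text{-a.s.} \tag{4.17} \]

But from the definition of the score function in (4.9) it follows that

\begin{align*}
S_{\tau_n}(\theta) - S_{\tau_n}(\hat\theta_{\tau_n})
 &= \sum_{i=1}^{n} \Big[ \big( s(\theta; b_i, X^i_1) - s(\hat\theta_{\tau_n}; b_i, X^i_1) \big) + \sum_{j=2}^{g^i_{\tau_n}} \big( s(\theta; b_i, X^i_j \mid X^i_{1:j-1}) - s(\hat\theta_{\tau_n}; b_i, X^i_j \mid X^i_{1:j-1}) \big) \Big] \\
 &= \sum_{i=1}^{n} \Big[ \big( \bar a(\hat\theta_{\tau_n}; b_i) - \bar a(\theta; b_i) \big) + \sum_{j=2}^{g^i_{\tau_n}} \big( \bar a(\hat\theta_{\tau_n}; b_i \mid X^i_{1:j-1}) - \bar a(\theta; b_i \mid X^i_{1:j-1}) \big) \Big] \\
 &\ge n \inf_{b \in \mathbb{B}} \big[ \bar a(\hat\theta_{\tau_n}; b) - \bar a(\theta; b) \big] + (\tau_n - n) \min_{2 \le j \le m-1}\, \min_{X_{1:j-1}}\, \inf_{b \in \mathbb{B}} \big[ \bar a(\hat\theta_{\tau_n}; b \mid X_{1:j-1}) - \bar a(\theta; b \mid X_{1:j-1}) \big].
\end{align*}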

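On the other hand, $I_{\tau_n}(\theta) \le \tau_n K$, which implies that

\[ \frac{S_{\tau_n}(\theta) - S_{\tau_n}(\hat\theta_{\tau_n})}{I_{\tau_n}(\theta)} \ge \frac{n}{\tau_n K} \inf_{b \in \mathbb{B}} \big[ \bar a(\hat\theta_{\tau_n}; b) - \bar a(\theta; b) \big] + \frac{\tau_n - n}{\tau_n K} \min_{2 \le j \le m-1}\, \min_{X_{1:j-1}}\, \inf_{b \in \mathbb{B}} \big[ \bar a(\hat\theta_{\tau_n}; b \mid X_{1:j-1}) - \bar a(\theta; b \mid X_{1:j-1}) \big], \]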

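where $X_{1:j-1} := (X_1, \ldots, X_{j-1})$ is a vector of $j-1$ responses on an item
with parameter $b$. Then, on the event $\{\limsup_n \hat\theta_{\tau_n} = \infty\}$ there exists a
subsequence $(\hat\theta_{\tau_{n_\ell}})$ of $(\hat\theta_{\tau_n})$ so that $\hat\theta_{\tau_{n_\ell}} \to \infty$ and, consequently,

\[ \liminf_{\ell \to \infty}\, \inf_{b \in \mathbb{B}} \big[ \bar a(\hat\theta_{\tau_{n_\ell}}; b) - \bar a(\theta; b) \big] > 0, \]

whereas for any $2 \le j \le m-1$ and $X_{1:j-1}$ we have

\[ \liminf_{\ell \to \infty}\, \inf_{b \in \mathbb{B}} \big[ \bar a(\hat\theta_{\tau_{n_\ell}}; b \mid X_{1:j-1}) - \bar a(\theta; b \mid X_{1:j-1}) \big] > 0. \]

Therefore, since $n/\tau_n \ge 1/(m-1)$, we conclude that

\[ \liminf_{\ell \to \infty} \frac{S_{\tau_{n_\ell}}(\theta) - S_{\tau_{n_\ell}}(\hat\theta_{\tau_{n_\ell}})}{I_{\tau_{n_\ell}}(\theta)} > 0 \]

and comparing with (4.17) we have that $P_\theta(\limsup_n \hat\theta_{\tau_n} = \infty) = 0$. Similarly
we can show that $P_\theta(\liminf_n \hat\theta_{\tau_n} = -\infty) = 0$, which proves (4.16) and,
consequently, the strong consistency of $\hat\theta_{\tau_n}$ as $n \to \infty$. In order to prove the
second claim of the theorem, we need to show that

\[ \frac{|I_{\tau_n}(\hat\theta_{\tau_n}) - I_{\tau_n}(\theta)|}{I_{\tau_n}(\theta)} \tag{4.18} \]

goes to 0 $P_\theta$-a.s. as $n \to \infty$. But $I_{\tau_n}(\theta) \ge n J_*(\theta)$, whereas $|I_{\tau_n}(\hat\theta_{\tau_n}) - I_{\tau_n}(\theta)|$
is bounded above by

\begin{align*}
 & \sum_{i=1}^{n} \big| J(\hat\theta_{\tau_n}; b_i) - J(\theta; b_i) \big| + \sum_{i=1}^{n} \sum_{j=2}^{g^i_{\tau_n}} \big| J(\hat\theta_{\tau_n}; b_i \mid X^i_{1:j-1}) - J(\theta; b_i \mid X^i_{1:j-1}) \big| \\
 & \quad \le n \sup_{b \in \mathbb{B}} \big| J(\hat\theta_{\tau_n}; b) - J(\theta; b) \big| + (\tau_n - n) \max_{2 \le j \le m-1}\, \max_{X_{1:j-1}}\, \sup_{b \in \mathbb{B}} \big| J(\hat\theta_{\tau_n}; b \mid X_{1:j-1}) - J(\theta; b \mid X_{1:j-1}) \big|,
\end{align*}

where again $X_{1:j-1} := (X_1, \ldots, X_{j-1})$ is a vector of $j-1$ responses on an
item with parameter $b$. Therefore, the ratio in (4.18) is bounded above by

\[ \frac{1}{J_*(\theta)} \sup_{b \in \mathbb{B}} \big| J(\hat\theta_{\tau_n}; b) - J(\theta; b) \big| + \frac{m-2}{J_*(\theta)} \max_{2 \le j \le m-1}\, \max_{X_{1:j-1}}\, \sup_{b \in \mathbb{B}} \big| J(\hat\theta_{\tau_n}; b \mid X_{1:j-1}) - J(\theta; b \mid X_{1:j-1}) \big|. \]

But we can show as in Theorem 3.1 that

\[ \sup_{b \in \mathbb{B}} \big| J(\hat\theta_{\tau_n}; b) - J(\theta; b) \big| \to 0 \quad P_\theta\text{-a.s.} \]

and, similarly, due to the strong consistency of $\hat\theta_{\tau_n}$ and the continuity of
$\theta \mapsto J(\theta; b \mid X_{1:j-1})$, we can apply Lemma 1 and show that for every
$2 \le j \le m-1$ we have

\[ \sup_{b \in \mathbb{B}} \big| J(\hat\theta_{\tau_n}; b \mid X_{1:j-1}) - J(\theta; b \mid X_{1:j-1}) \big| \to 0 \quad P_\theta\text{-a.s.} \]

This implies that (4.18) goes to 0 a.s. and completes the proof. □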

While we established the strong consistency of $\hat\theta_{\tau_n}$ without any conditions,
its asymptotic normality requires certain conditions on the item selection
strategy and the number of revisions. Indeed, in order to apply the Martingale
Central Limit Theorem, as we did in the case of the regular CAT, we
need to make sure that

\[ \frac{1}{n} I_{\tau_n}(\theta) = \frac{1}{n} \sum_{i=1}^{n} J(\theta; b_i) + \frac{1}{n} I^R_{\tau_n}(\theta) \tag{4.19} \]

converges in probability, where $I^R$ is the part of the Fisher information due
to revisions (recall (4.10)). If we select each item in order to maximize the
Fisher information at the current estimate of the ability level, i.e.,

\[ b_{i+1} \in \arg\max_{b \in \mathbb{B}} J(\hat\theta_{\tau_i}; b), \quad i = 1, \ldots, n-1, \tag{4.20} \]

where $J(\theta; b)$ is the Fisher information of the nominal response model,
defined in (2.8), then we can show as in the case of the regular CAT that

\[ \frac{1}{n} \sum_{i=1}^{n} J(\theta; b_i) \to J^*(\theta) \quad P_\theta\text{-a.s.} \]

However, the item selection strategy does not control the second term in
(4.19). Nevertheless, we can see that

\[ \frac{1}{n} I^R_{\tau_n}(\theta) \le K\, \frac{\tau_n - n}{n}, \]

which implies that $I^R_{\tau_n}(\theta)/n$ will converge to 0 in probability as long as the
number of revisions is small relative to the total number of items, in the
sense that $(\tau_n - n)/n$ goes to 0 in probability, i.e., $\tau_n - n = o_p(n)$. This is
the content of the following theorem.

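Theorem 4.2. If $I_{\tau_n}(\theta)/n$ converges in probability, then

\[ \sqrt{I_{\tau_n}(\hat\theta_{\tau_n})}\, (\hat\theta_{\tau_n} - \theta) \to \mathcal{N}(0, 1). \tag{4.21} \]

This is true in particular when the information-maximizing item selection
strategy (4.20) is used and the number of revisions is much smaller than the
number of items, in the sense that $\tau_n - n = o_p(n)$, in which case we have

\[ \sqrt{n}\, (\hat\theta_{\tau_n} - \theta) \to \mathcal{N}\big(0, [J^*(\theta)]^{-1}\big). \tag{4.22} \]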



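Proof. We will first show that if $I_{\tau_n}(\theta)/n$ converges in probability, then

\[ \frac{S_{\tau_n}(\theta)}{\sqrt{I_{\tau_n}(\theta)}} \to \mathcal{N}(0, 1). \tag{4.23} \]

In order to do so, we define the martingale-difference array

\[ Y_{nt} := \frac{S_t(\theta) - S_{t-1}(\theta)}{\sqrt{n}}\, 1_{\{t \le \tau_n\}}, \quad t \in \mathbb{N},\ n \in \mathbb{N}. \]

Indeed, since $\{S_t(\theta)\}$ is an $\{\mathcal{F}_t\}$-martingale and $\tau_n$ an $\{\mathcal{F}_t\}$-stopping time,
then $\{t \le \tau_n\} = \{\tau_n \le t-1\}^c \in \mathcal{F}_{t-1}$ and, consequently, we have

\[ E_\theta[Y_{nt} \mid \mathcal{F}_{t-1}] = \frac{1_{\{t \le \tau_n\}}}{\sqrt{n}}\, E_\theta[S_t(\theta) - S_{t-1}(\theta) \mid \mathcal{F}_{t-1}] = 0. \]

Moreover, the increments of $\{S_t(\theta)\}$ are uniformly bounded by $K$, which
implies that for every $\epsilon > 0$ we have

\[ \sum_{t=1}^{\infty} E_\theta\big[ Y_{nt}^2\, 1_{\{|Y_{nt}| > \epsilon\}} \big] \to 0 \tag{4.24} \]

as $n \to \infty$. Therefore, from the Martingale Central Limit Theorem (see, e.g.,
Theorem 35.12 in [1]) and Slutsky's theorem it follows that if

\[ \sum_{t=1}^{\infty} E_\theta\big[ Y_{nt}^2 \mid \mathcal{F}_{t-1} \big] = \frac{1}{n} \sum_{t=1}^{\tau_n} E_\theta\big[ (S_t(\theta) - S_{t-1}(\theta))^2 \mid \mathcal{F}_{t-1} \big] = \frac{I_{\tau_n}(\theta)}{n} \tag{4.25} \]

converges in probability to a positive number, then

\[ \frac{S_{\tau_n}(\theta)}{\sqrt{I_{\tau_n}(\theta)}} \to \mathcal{N}(0, 1). \]

If we now use the Taylor expansion (4.15), then the convergence (4.23) takes
the form

\[ \frac{I_{\tau_n}(\tilde\theta_{\tau_n})}{I_{\tau_n}(\theta)}\, \sqrt{I_{\tau_n}(\theta)}\, (\hat\theta_{\tau_n} - \theta) \to \mathcal{N}(0, 1), \]

where $\tilde\theta_{\tau_n}$ lies between $\hat\theta_{\tau_n}$ and $\theta$. From the consistency of the estimator
(4.13) it follows that the ratio in the left-hand side goes to 1 almost surely
and from Slutsky's theorem we obtain

\[ \sqrt{I_{\tau_n}(\theta)}\, (\hat\theta_{\tau_n} - \theta) \to \mathcal{N}(0, 1). \]

From (4.13) and another application of Slutsky's theorem we now obtain
(4.21). Finally, the second part follows from the discussion that led to
Theorem 4.2. □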

Therefore, the proposed design leads to the same asymptotic behavior as
the regular CAT design, as long as the number of revisions is small
relative to the number of distinct items. We expect that this is typically the
case in practice, since most examinees tend to review and revise only a few
items about which they are unsure, either during the test or at the end of the
test.
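
The information-maximizing rule (4.20) can be illustrated over a finite item pool. The sketch below is only an illustration under stated assumptions: the paper maximizes over a bounded parameter set rather than a discrete pool, the softmax form $p_k(\theta; b) \propto \exp(a_k\theta + c_k)$ is assumed for the nominal response model, and the Fisher information is computed as the variance of the slope under the category probabilities, by analogy with the expression for $J(\theta; b_i \mid X^i_{1:j-1})$ in Proposition 2. The names `fisher_information` and `select_item` are hypothetical.

```python
import math


def fisher_information(theta, a, c):
    """Variance of the slope a_K under the nominal response pmf at theta."""
    logits = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    top = max(logits)  # stabilise the exponentials
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(a_k * p_k for a_k, p_k in zip(a, probs))
    return sum(p_k * (a_k - mean) ** 2 for a_k, p_k in zip(a, probs))


def select_item(theta_hat, pool):
    """Index of the pool item with the largest information at theta_hat.

    pool is a list of (a, c) pairs; ties are broken by the lowest index.
    """
    return max(range(len(pool)),
               key=lambda i: fisher_information(theta_hat, *pool[i]))
```

With the hypothetical pool `[([0, 1], [0, 0]), ([0, 2], [0, 0]), ([0, 2], [0, -6])]`, the second item is selected at $\hat\theta = 0$ and the third at $\hat\theta = 3$, which shows how the rule tracks the current ability estimate.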

5. Simulation study 

We now present the results of a simulation study that illustrates the
proposed design and our asymptotic results in a CAT with $n = 50$ items. We
consider items with $m = 3$ categories; thus, each item can be revised at
most once whenever revision is allowed. The parameters of the nominal
response model are restricted to the following intervals: $a_2 \in [-0.18, 4.15]$,
$a_3 \in [0.17, 3.93]$, $c_2 \in [-8.27, 6.38]$ and $c_3 \in [-7.00, 8.24]$, whereas we
set $a_1 = c_1 = 0$; these ranges were selected based on a discrete item pool in
Passos, Berger & Tan [16]. The analysis was replicated for $\theta$ in
$\{-3, -2, -1, 0, 1, 2, 3\}$.

With respect to the revision strategy, we assume that the examinee
decides to revise the $t$th question with probability $p_t$. If we denote the total
number of items which can be revised during the test as $n_1$, then $p_t$ satisfies
the following recursion: $p_{t+1} = p_t - 0.5/n_1$, $p_1 = 0.5$. For $n_1$ we considered
the following possibilities: $n_1/n = 0.1, 0.5, 1$. Moreover, we assumed that
whenever the examinee decides to revise, each of the previous items that
have not been revised yet is equally likely to be selected.
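
The revision mechanism described above can be sketched in a few lines of Python. This is an illustrative reading of the recursion, not the simulation code used in the study: how the index $t$ is advanced during the test is not fully specified in the text, so the sketch only generates the sequence $p_1, p_2, \ldots$ and the uniform choice among unrevised items. The helper names are hypothetical.

```python
import random


def revision_probabilities(n1, p1=0.5):
    """Sequence p_1, ..., p_{n1+1} from p_{t+1} = p_t - 0.5 / n1.

    Values are clipped at 0 to guard against floating-point round-off.
    """
    probs = [p1]
    for _ in range(n1):
        probs.append(max(probs[-1] - 0.5 / n1, 0.0))
    return probs


def decides_to_revise(p, rng):
    """Bernoulli(p) decision of the simulated examinee."""
    return rng.random() < p


def pick_item_to_revise(unrevised, rng):
    """Choose uniformly among the items that have not been revised yet."""
    return rng.choice(sorted(unrevised))
```

For instance, `revision_probabilities(5)` decreases linearly from 0.5 to (numerically) 0 in steps of 0.1, so the simulated examinee becomes less inclined to revise as the recursion advances.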

For each of the above scenarios, we computed the root mean square error
(RMSE) of the final estimate on the basis of 1000 simulation runs. The
results are summarized in Table 1. Note that when revision is allowed, the
design is denoted as RCAT. We observe that revision often improves the
ability estimation, especially when the number of revisions is large.
However, the RMSE is typically larger than the square root of the asymptotic
variance, $1/\sqrt{n J^*(\theta)}$. An exception seems to be the case that $\theta = -2$ with
a large number of revisions. In order to understand this further, we plot
in Figure 1 the evolution of the total information $I_t(\theta)/t$ (solid line with
circles), the information from the first responses, $\sum_{i=1}^{f_t} J(\theta; b_i)/f_t$ (dashed
line with squares), and the information from revisions, $I^R_t(\theta)/t$ (dashed line with
diamonds), where $1 \le t \le \tau_n$ and $I^R_t(\theta)$ is defined in (4.10). The horizontal
line represents $J^*(\theta)$, the reciprocal of the asymptotic variance. Thus, we see that thanks to
the contribution from a large number of revisions, it is possible to
outperform the best asymptotic performance that can be achieved in a standard
CAT design.

Table 1
RMSE in CAT and RCAT

                                          RCAT (expected number of revisions)
$\theta$   $1/\sqrt{nJ^*(\theta)}$   CAT        4          18         26
  -3       0.0985                    0.1042     0.1051     0.1068     0.1001
  -2       0.0713                    0.0746     0.0731     0.0701     0.0700
  -1       0.0681                    0.0716     0.0724     0.0718     0.0714
   0       0.0681                    0.0743     0.0723     0.0722     0.0721
   1       0.0683                    0.0773     0.0716     0.0699     0.0704
   2       0.0681                    0.0747     0.0718     0.0702     0.0701
   3       0.0710                    0.0787     0.0756     0.0728     0.0721

Finally, we plot in Figure 2 the "confidence intervals" that would be
obtained after $i$ items have been completed in the case of a standard CAT,
as well as when revision is allowed (in the case that $\theta = 3$). Our asymptotic
results suggest their validity for a large number of items and our graphs
illustrate that revision seems to actually improve the estimation of $\theta$.
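
The intervals shown in Figure 2 are of the usual Wald type, centered at the current estimate with half-width $1.96$ times the inverse square root of the estimated Fisher information. A minimal sketch of this computation follows; the function name `wald_interval` is an illustrative choice, and the information value is assumed to be supplied by the caller.

```python
import math


def wald_interval(theta_hat, information, z=1.96):
    """Interval theta_hat +/- z * information**(-1/2).

    information is the accumulated Fisher information evaluated at the
    current estimate and must be strictly positive.
    """
    half_width = z / math.sqrt(information)
    return theta_hat - half_width, theta_hat + half_width
```

In the design with revision, the information passed to this function would include the contribution of the revised responses, which is why the intervals can be narrower than in the standard CAT after the same number of items.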


6. Conclusions 

In the first part of this work, we considered the design of CAT that is based 
on the nominal response model. Assuming conditional independence of the 
responses given the selected items and that the item parameters belong to a 
bounded set, we established the strong consistency of the MLE for any item 
selection strategy and its asymptotic efficiency when the items are selected 
to maximize the current level of Fisher information. It is interesting to note 
that in the special case of binary items (m = 2) the nominal response model 
reduces to the dichotomous 2PL model and, in this context, our results 
complement the ones that were obtained in [5] under the same model. Indeed,
here we assume that all item parameters belong to a bounded set, whereas
in [5] it is assumed that the item difficulty parameter, $b$, is unbounded, a
rather unrealistic assumption in practice where items are drawn from a given
item bank. Moreover, we establish the strong consistency of the MLE for
any item selection strategy, unlike [5] where this is done only when $b_i = \hat\theta_{i-1}$.
Finally, from a technical point of view, while the proofs in [5] are heavily
based on this closed-form expression for the $b_i$'s, here we do not explicitly
use this expression in our proofs (since it is not available in the general case
of the nominal response model anyway).

[Figure 1: "The Decomposition of Fisher Information".]

Fig 1: The solid line represents the evolution of the normalized Fisher information,
that is $\{I_t(\hat\theta_t)/t,\ 1 \le t \le \tau_n\}$, in a CAT with response revision. The dashed line
with squares represents the information from the first responses and the dashed
line with diamonds the information from revisions, according to the decomposition
(4.10). The true ability value is $\theta = -2$.

[Figure 2: two panels, "CAT, $\theta = 3$" (left) and "RCAT, $\theta = 3$" (right); horizontal axis: test length, from 10 to 50.]

Fig 2: The plot in the left-hand side presents the intervals $\hat\theta_i \pm 1.96 \cdot (I_i(\hat\theta_i))^{-1/2}$,
$1 \le i \le n$, in the case of the standard CAT. The plot in the right-hand side presents
the intervals $\hat\theta_{\tau_i} \pm 1.96 \cdot (I_{\tau_i}(\hat\theta_{\tau_i}))^{-1/2}$, $1 \le i \le n$, in the case where response revision
is allowed. In both cases, the true value of $\theta$ is 3.

In the second part of this work, we proposed a novel CAT design in 
which response revision is allowed. We showed that the proposed estimator 
is strongly consistent and that it becomes asymptotically normal (with the 
same asymptotic variance as in the standard CAT) when items are selected 
to maximize the Fisher information at the current ability estimate and the 
number of revisions is small relative to the total number of items. We further
illustrated our theoretical results with a simulation study. 

From a policy point of view, our main message is that the nominal
response model should be used for the design of CAT for two reasons. First,
because it captures more information than dichotomous models, which
collapse all possible wrong answers of an item to one category. Second, because
it can be used in a natural way to allow for response revision. In fact, one of
the most appealing aspects of our approach is that it incorporates response
revision without any calibration effort beyond that needed by the
standard CAT that is based on the nominal response model.

Our work provides the first rigorous analysis of a CAT design in which 
response revision is allowed and it opens a number of research directions. 
First of all, items in reality are drawn without replacement from a finite 
pool. This may call for modifications of the item selection strategy in order 
to make the proposed scheme more robust (see, e.g., [3]). Moreover, more 
empirical work is required in order to understand the effect of response 
revision on the ability estimation, which can be much more substantial in 
practice than in the (idealistic) setup of our simulation study. 

While our approach is robust, in the sense that we do not explicitly model 
the decision of the examinee to revise or not at each step given the selected 
items, it may result in a loss of efficiency when the revision strategy depends 
on the ability of the examinee. Modeling this behavior is a challenge that
could be addressed as soon as CATs that allow for response revision begin
to be implemented in practice and relevant data can be obtained. Finally,
it remains an open problem to incorporate response revision in the case of
binary items, where a dichotomous IRT model needs to be used and our
approach cannot be applied.